 semantic reasoning


ReflexGrad: Three-Way Synergistic Architecture for Zero-Shot Generalization in LLM Agents

Kadu, Ankush, Krishnan, Ashwanth

arXiv.org Artificial Intelligence

Enabling agents to learn from experience and generalize across diverse tasks without task-specific training remains a fundamental challenge in reinforcement learning and decision-making. While recent approaches have explored episodic memory (Reflexion), gradient-based prompt optimization (TextGrad), and hierarchical task decomposition independently, their potential for synergistic integration remains unexplored. We introduce ReflexGrad, a novel architecture that tightly couples three complementary mechanisms: (1) LLM-based hierarchical TODO decomposition for strategic planning, (2) history-aware causal reflection that analyzes recent action patterns to identify failure root causes and enable within-trial learning, and (3) gradient-based optimization for systematic improvement. Unlike prior work relying on few-shot demonstrations, our system achieves true zero-shot generalization through pure LLM semantic reasoning, requiring no task-specific examples, fine-tuning, or hardcoded similarity metrics. Evaluated on ALFWorld benchmark tasks, ReflexGrad demonstrates 67% zero-shot success rate on Trial 0 without any prior task experience or demonstrations, establishing effective performance on first exposure. Through empirical analysis, we identify the architectural mechanisms underlying stable convergence (zero action loops) and effective cross-task transfer (67% to 78% improvement). Our work demonstrates that synergistic integration of complementary learning mechanisms enables robust zero-shot generalization that approaches few-shot baselines from prior work.


WAR-Re: Web API Recommendation with Semantic Reasoning

Xu, Zishuo, Yao, Dezhong, Wan, Yao

arXiv.org Artificial Intelligence

With the development of cloud computing, the number of Web APIs has increased dramatically, further intensifying the demand for efficient Web API recommendation. Despite the demonstrated success of previous Web API recommendation solutions, two critical challenges persist: 1) a fixed top-N recommendation that cannot accommodate the varying API cardinality requirements of different mashups, and 2) these methods output only ranked API lists without accompanying reasons, depriving users of understanding the recommendation. To address these challenges, we propose WAR-Re, an LLM-based model for Web API recommendation with semantic reasoning for justification. WAR-Re leverages special start and stop tokens to handle the first challenge and uses two-stage training: supervised fine-tuning and reinforcement learning via Group Relative Policy Optimization (GRPO) to enhance the model's ability in both tasks. Comprehensive experimental evaluations on the ProgrammableWeb dataset demonstrate that WAR-Re achieves a gain of up to 21.59% over the state-of-the-art baseline model in recommendation accuracy, while consistently producing high-quality semantic reasons for recommendations.
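
As a rough illustration of the group-relative idea behind GRPO mentioned above, the sketch below normalizes each sampled output's reward against the other samples drawn for the same prompt. It is not the WAR-Re training code: the function name, the reward values, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sample relative to its own sampling group.

    Assumption: advantage = (reward - group mean) / (group std + eps),
    using the population standard deviation of the group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four recommendations sampled for one mashup.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and those below it a negative one, so no separately learned value baseline is needed in this simplified view.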


Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning

Li, Xun, Cruz, Rodrigo Santa, Xi, Mingze, Zhang, Hu, Perera, Madhawa, Wang, Ziwei, Ravendran, Ahalya, Matthews, Brandon J., Xu, Feng, Adcock, Matt, Wang, Dadong, Liu, Jiajun

arXiv.org Artificial Intelligence

To enable robots to comprehend high-level human instructions and perform complex tasks, a key challenge lies in achieving comprehensive scene understanding: interpreting and interacting with the 3D environment in a meaningful way. This requires a smart map that fuses accurate geometric structure with rich, human-understandable semantics. To address this, we introduce the 3D Queryable Scene Representation (3D QSR), a novel framework built on multimedia data that unifies three complementary 3D representations: (1) 3D-consistent novel view rendering and segmentation from panoptic reconstruction, (2) precise geometry from 3D point clouds, and (3) structured, scalable organization via 3D scene graphs. Built on an object-centric design, the framework integrates with large vision-language models to enable semantic queryability by linking multimodal object embeddings, and supporting object-level retrieval of geometric, visual, and semantic information. The retrieved data are then loaded into a robotic task planner for downstream execution. We evaluate our approach through simulated robotic task planning scenarios in Unity, guided by abstract language instructions and using the indoor public dataset Replica. Furthermore, we apply it in a digital duplicate of a real wet lab environment to test QSR-supported robotic task planning for emergency response. The results demonstrate the framework's ability to facilitate scene understanding and integrate spatial and semantic reasoning, effectively translating high-level human instructions into precise robotic task planning in complex 3D environments.


CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning

Roy, Monoshi Kumar, Chen, Simin, Steenhoek, Benjamin, Peng, Jinjun, Kaiser, Gail, Ray, Baishakhi, Le, Wei

arXiv.org Artificial Intelligence

Understanding and reasoning about code semantics is essential for enhancing code LLMs' abilities to solve real-world software engineering (SE) tasks. Although several code reasoning benchmarks exist, most rely on synthetic datasets or educational coding problems and focus on coarse-grained reasoning tasks such as input/output prediction, limiting their effectiveness in evaluating LLMs in practical SE contexts. To bridge this gap, we propose CodeSense, the first benchmark that makes available a spectrum of fine-grained code reasoning tasks concerned with the software engineering of real-world code. We collected Python, C and Java software projects from real-world repositories. We executed tests from these repositories, collected their execution traces, and constructed a ground truth dataset for fine-grained semantic reasoning tasks. We then performed comprehensive evaluations on state-of-the-art LLMs. Our results show a clear performance gap for the models in handling fine-grained reasoning tasks. Although prompting techniques such as chain-of-thought and in-context learning helped, the lack of code semantics in LLMs fundamentally limits models' capabilities of code reasoning. Besides the dataset, benchmark and evaluation, our work produced an execution tracing framework and tool set that make it easy to collect ground truth for fine-grained SE reasoning tasks, offering a strong basis for future benchmark construction and model post training. Our code and data are located at https://codesense-bench.github.io/.


SLAM-Free Visual Navigation with Hierarchical Vision-Language Perception and Coarse-to-Fine Semantic Topological Planning

Zhao, Guoyang, Li, Yudong, Qi, Weiqing, Zhang, Kai, Liu, Bonan, Chen, Kai, Li, Haoang, Ma, Jun

arXiv.org Artificial Intelligence

Conventional SLAM pipelines for legged robot navigation are fragile under rapid motion, calibration demands, and sensor drift, while offering limited semantic reasoning for task-driven exploration. To deal with these issues, we propose a vision-only, SLAM-free navigation framework that replaces dense geometry with semantic reasoning and lightweight topological representations. A semantic-probabilistic topological map supports coarse-to-fine planning: LLM-based global reasoning for subgoal selection and vision-based local planning for obstacle avoidance. Integrated with reinforcement-learning locomotion controllers, the framework is deployable across diverse legged robot platforms. Experiments in simulation and real-world settings demonstrate consistent improvements in semantic accuracy, planning quality, and navigation success, while ablation studies further showcase the necessity of both hierarchical perception and fine local planning. This work introduces a new paradigm for SLAM-free, vision-language-driven navigation, shifting robotic exploration from geometry-centric mapping to semantics-driven decision making. Autonomous exploration and navigation remain fundamental challenges for mobile robots in open and unstructured environments.


Nav-R1: Reasoning and Navigation in Embodied Scenes

Liu, Qingxiang, Huang, Ting, Zhang, Zeyu, Tang, Hao

arXiv.org Artificial Intelligence

"The division of labor between System 1 (fast) and System 2 (slow) is highly efficient: it minimizes effort and optimizes performance." Nav-R1 is an embodied foundation model that integrates dialogue, reasoning, planning, and navigation capabilities to enable intelligent interaction and task execution in 3D environments. Embodied navigation requires agents to integrate perception, reasoning, and action for robust interaction in complex 3D environments. Existing approaches often suffer from incoherent and unstable reasoning traces that hinder generalization across diverse environments, and difficulty balancing long-horizon semantic reasoning with low-latency control for real-time navigation. To address these challenges, we propose Nav-R1, an embodied foundation model that unifies reasoning in embodied environments. We first construct Nav-CoT-110K, a large-scale dataset of step-by-step Chains-of-Thought (CoT) for embodied tasks, which enables cold-start initialization with structured reasoning. Building on this foundation, we design a GRPO-based reinforcement learning framework with three complementary rewards: format, understanding, and navigation, to improve structural adherence, semantic grounding, and path fidelity. Furthermore, we introduce a Fast-in-Slow reasoning paradigm, decoupling deliberate semantic reasoning from low-latency reactive control for efficient yet coherent navigation.


SEED: A Structural Encoder for Embedding-Driven Decoding in Time Series Prediction with LLMs

Li, Fengze, Wang, Yue, Liu, Yangle, Huang, Ming, Hong, Dou, Ma, Jieming

arXiv.org Artificial Intelligence

Multivariate time series forecasting requires models to simultaneously capture variable-wise structural dependencies and generalize across diverse tasks. While structural encoders are effective in modeling feature interactions, they lack the capacity to support semantic-level reasoning or task adaptation. Conversely, large language models (LLMs) possess strong generalization capabilities but remain incompatible with raw time series inputs. This gap limits the development of unified, transferable prediction systems. Therefore, we introduce SEED, a structural encoder for embedding-driven decoding, which integrates four stages: a token-aware encoder for patch extraction, a projection module that aligns patches with language model embeddings, a semantic reprogramming mechanism that maps patches to task-aware prototypes, and a frozen language model for prediction. This modular architecture decouples representation learning from inference, enabling efficient alignment between numerical patterns and semantic reasoning. Empirical results demonstrate that the proposed method achieves consistent improvements over strong baselines, and comparative studies on various datasets confirm SEED's role in addressing the structural-semantic modeling gap.
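
The first of the four stages above, patch extraction, can be pictured as slicing a series into fixed-length, possibly overlapping windows. The sketch below shows only that generic slicing step under stated assumptions: `extract_patches`, `patch_len`, and `stride` are hypothetical names, and SEED's actual token-aware encoder is not specified in the abstract.

```python
def extract_patches(series, patch_len, stride):
    """Slice a 1-D series into windows of length patch_len.

    Assumption: windows start every `stride` steps and any trailing
    values that do not fill a whole window are dropped.
    """
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


# Hypothetical single-variable series with six observations.
patches = extract_patches([1, 2, 3, 4, 5, 6], patch_len=4, stride=2)
```

In a pipeline of the kind described, each such patch would then be projected into the language model's embedding space before being passed to the frozen model.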


TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking

Hang, Ching Nam, Yu, Pei-Duo, Tan, Chee Wei

arXiv.org Artificial Intelligence

By effectively merging these two retrieval paradigms, the system would be capable of assembling a more comprehensive evidence base, thereby reducing the likelihood of missing pertinent details. Additionally, incorporating incremental graph update techniques would enable TrumorGPT to seamlessly integrate new medical studies and real-time health news without the need for extensive re-indexing or system downtime. This continuous update process is particularly crucial in the dynamic field of health, where the rapid emergence of new data can significantly impact the accuracy of fact-checking outcomes. In addition to efficient updating, implementing a dual-level retrieval strategy can further enhance contextual reasoning. Under this strategy, an initial coarse-grained retrieval would rapidly identify broad thematic and relational contexts, while a subsequent fine-grained search would extract specific factual details. This layered retrieval approach not only ensures that both high-level and granular information is captured but also supports more robust multi-hop reasoning by effectively bridging the gap between abstract concepts and concrete facts. Thus, these enhancements would bolster the fact-checking framework of TrumorGPT, striking an optimal balance between precision, efficiency, and comprehensive reasoning.
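
The dual-level strategy described above can be sketched as a coarse pass that selects topics followed by a fine pass that ranks facts within them. This is a toy illustration, not TrumorGPT's implementation: the word-overlap scoring, the `dual_level_retrieve` function, and the sample topics are all assumptions made for the example.

```python
def overlap(query, text):
    """Count the distinct lowercase words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def dual_level_retrieve(query, topics, top_topics=1, top_facts=2):
    """Coarse-to-fine retrieval over a hypothetical topic index."""
    # Coarse level: keep the topics whose summary best matches the query.
    coarse = sorted(topics, key=lambda t: overlap(query, t["summary"]),
                    reverse=True)[:top_topics]
    # Fine level: rank individual facts inside the selected topics only.
    facts = [fact for topic in coarse for fact in topic["facts"]]
    return sorted(facts, key=lambda f: overlap(query, f),
                  reverse=True)[:top_facts]


# Hypothetical toy index; real systems would use a graph or vector store.
topics = [
    {"summary": "vaccine safety and side effects",
     "facts": ["the vaccine trial reported mild side effects",
               "vaccine doses are given weeks apart"]},
    {"summary": "diet and nutrition guidance",
     "facts": ["a balanced diet includes vegetables",
               "sugar intake should be limited"]},
]
results = dual_level_retrieve("vaccine side effects", topics, top_facts=1)
```

Restricting the fine-grained search to the topics chosen in the coarse pass is what keeps the second stage cheap; the trade-off is that a wrong coarse choice cannot be recovered later.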


The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare

Pashangpour, Souren, Nejat, Goldie

arXiv.org Artificial Intelligence

The potential use of large language models (LLMs) in healthcare robotics can help address the significant demand put on healthcare systems around the world with respect to an aging demographic and a shortage of healthcare professionals. Even though LLMs have already been integrated into medicine to assist both clinicians and patients, the integration of LLMs within healthcare robots has not yet been explored for clinical settings. In this perspective paper, we investigate the groundbreaking developments in robotics and LLMs to uniquely identify the needed system requirements for designing health-specific LLM-based robots in terms of multi-modal communication through human-robot interactions (HRIs), semantic reasoning, and task planning. Furthermore, we discuss the ethical issues, open challenges, and potential future research directions for this emerging innovative field.


Minds, Brains, AI

Seitz, Jay

arXiv.org Artificial Intelligence

In the last year or so (and going back many decades) there have been extensive claims by major computational scientists, engineers, and others that AGI (artificial general intelligence) is 5 or 10 years away, but without a scintilla of scientific evidence for a broad body of these claims: Computers will become conscious, have a "theory of mind," think and reason, will become more intelligent than humans, and so on. But the claims are science fiction, not science. This article reviews evidence for the following three (3) propositions using an extensive body of scientific research and related sources from the cognitive and neurosciences; evolutionary evidence; linguistics; data science; comparative psychology; self-driving cars and robotics; and the learning sciences.